Abstract

In this paper we examine decision problems associated with various classes of convex 
languages, studied by Ang and Brzozowski (under the name "continuous languages"). 
We show that we can decide whether a given language L is prefix-, suffix-, factor-, 
or subword-convex in polynomial time if L is represented by a DFA, but that the 
problem is PSPACE-hard if L is represented by an NFA. In the case that a regular 
language is not convex, we prove tight upper bounds on the length of the shortest 
words demonstrating this fact, in terms of the number of states of an accepting DFA. 
Similar results are proved for some subclasses of convex languages: the prefix-, suffix-, 
factor-, and subword-closed languages, and the prefix-, suffix-, factor-, and subword- 
free languages. 

1 Introduction 

Thierrin [11] introduced convex languages with respect to the subword relation. Ang and 
Brzozowski [2] generalized this concept to arbitrary relations. For example, a language $L$ is
said to be prefix-convex if, whenever $u, w \in L$ with $u$ a prefix of $w$, then any word $v$ must
also be in $L$ if $u$ is a prefix of $v$ and $v$ is a prefix of $w$. Similar definitions hold for suffix-,
factor-, and subword-convex languages. (In this paper, a "factor" is a contiguous block inside 
another word, while a "subword" need not be contiguous. In the literature, these concepts 
are sometimes called "subword" and "subsequence", respectively.) 

A language is said to be prefix-free if whenever $w \in L$, then no proper prefix of $w$ is in $L$.
(By proper we mean a prefix of $w$ other than $w$ itself.) Prefix-free languages (prefix codes)
were studied by Berstel and Perrin [I]. Han has recently considered X-free languages for
various values of X, such as prefix, suffix, factor and subword [7].

A language is said to be prefix-closed if whenever $w \in L$, then every prefix of $w$ is also in $L$.
Analogous definitions hold for suffix-, factor-, and subword-closed languages. A factor-closed 
language is often called factorial. 

In this paper we consider the computational complexity of testing whether a given lan- 
guage has the property of being prefix-convex, suffix-convex, etc., prefix-closed, suffix-closed, 
etc., for a total of 12 different problems. As we will see, the computational complexity of 
these decision problems depends on how the language is represented. If it is represented as 
the language accepted by a DFA, then the decision problem is solvable in polynomial time. 
On the other hand, if it is represented as a regular expression or an NFA, then the decision 
problem is PSPACE-complete. We also consider the following question: given that a lan- 
guage is not prefix-convex, suffix-convex, etc., what is a good upper bound on the shortest 
words (shortest witnesses) demonstrating this fact? 

The remainder of the paper is structured as follows. In Section 2 we study the complexity
of testing for convexity for languages represented by DFA's, and we include testing for closure
and freeness as special cases. In Section 3 we exhibit shortest witnesses to the failure of the
convexity property. Convex languages specified by NFA's are studied next, and we also
briefly consider convex languages specified by context-free grammars. The final section
concludes the paper.

2 Deciding convexity for DFA's 

We will show that, if a regular language $L$ is represented by a DFA $M$ with $n$ states, it
is possible to test the property of prefix-, suffix-, factor-, and subword-convexity efficiently.
More precisely, we can test these properties in $O(n^3)$ time.

Let $\trianglelefteq$ be one of the four relations prefix, suffix, factor, or subword. The basic idea is
as follows: $L$ is not $\trianglelefteq$-convex if and only if there exist words $u, w \in L$, $v \notin L$, such that
$u \trianglelefteq v \trianglelefteq w$. Given $M$, we create an NFA-$\epsilon$ $M'$ with $O(n^3)$ states and transitions that accepts
the language

    $\{\, w \in L(M) :$ there exist $u \in L(M)$, $v \notin L(M)$ such that $u \trianglelefteq v \trianglelefteq w \,\}$.

Then $L(M') = \emptyset$ if and only if $L(M)$ is $\trianglelefteq$-convex. We can test the emptiness of $L(M')$ using
depth-first search in time linear in the size of $M'$. This gives an $O(n^3)$ algorithm for testing
the $\trianglelefteq$-convex property.

Since the constructions for all four properties are similar, in the next subsection we 
handle the hardest case (factor-convexity) in detail. In the following subsections we content 
ourselves with a brief sketch of the necessary constructions. 
2.1 Factor-convexity

Suppose $M = (Q, \Sigma, \delta, q_0, F)$ is a DFA accepting the language $L = L(M)$, and suppose $M$
has $n$ states. We now construct an NFA-$\epsilon$ $M'$ such that

    $L(M') = \{\, w \in \Sigma^* :$ there exist $u, v \in \Sigma^*$ such that $u$ is a factor of $v$,
        $v$ is a factor of $w$, and $u, w \in L$, $v \notin L \,\}$.

Clearly $L(M') = \emptyset$ if and only if $L(M)$ is factor-convex.

Here is the construction of M'. States of M' are quadruples, where components 1, 2, and 
3 keep track of where M is upon processing w, v, and u (respectively). The last component 
is a flag indicating the present mode of the simulation process. 

Formally, $M' = (Q', \Sigma, \delta', q_0', F')$, where

    $Q' = Q \times Q \times Q \times \{1, 2, 3, 4, 5\}$;

    $q_0' = [q_0, q_0, q_0, 1]$;

    $F' = F \times (Q - F) \times F \times \{5\}$;

1. $\delta'([p, q_0, q_0, 1], a) = \{[\delta(p, a), q_0, q_0, 1]\}$, for all $p \in Q$, $a \in \Sigma$;

2. $\delta'([p, q_0, q_0, 1], \epsilon) = \{[p, q_0, q_0, 2]\}$, for all $p \in Q$;

3. $\delta'([p, q, q_0, 2], a) = \{[\delta(p, a), \delta(q, a), q_0, 2]\}$, for all $p, q \in Q$, $a \in \Sigma$;

4. $\delta'([p, q, q_0, 2], \epsilon) = \{[p, q, q_0, 3]\}$, for all $p, q \in Q$;

5. $\delta'([p, q, r, 3], a) = \{[\delta(p, a), \delta(q, a), \delta(r, a), 3]\}$, for all $p, q, r \in Q$, $a \in \Sigma$;

6. $\delta'([p, q, r, 3], \epsilon) = \{[p, q, r, 4]\}$, for all $p, q, r \in Q$;

7. $\delta'([p, q, r, 4], a) = \{[\delta(p, a), \delta(q, a), r, 4]\}$, for all $p, q, r \in Q$, $a \in \Sigma$;

8. $\delta'([p, q, r, 4], \epsilon) = \{[p, q, r, 5]\}$, for all $p, q, r \in Q$;

9. $\delta'([p, q, r, 5], a) = \{[\delta(p, a), q, r, 5]\}$, for all $p, q, r \in Q$, $a \in \Sigma$.

One verifies that the NFA-$\epsilon$ $M'$ has $3n^3 + n^2 + n$ states and $(3|\Sigma| + 2)n^3 + (|\Sigma| + 1)(n^2 + n)$
transitions, where $|\Sigma|$ is the cardinality of $\Sigma$.

To see that the construction is correct, suppose $L$ is not factor-convex. Then there exist
words $u, v, w$ such that $u$ is a factor of $v$, $v$ is a factor of $w$, and $u, w \in L$ while $v \notin L$. Then
there exist words $u', u'', v', v''$ such that $v = u'uu''$ and $w = v'vv'' = v'u'uu''v''$. Let
$\delta(q_0, v') = q_1$, $\delta(q_1, u') = q_2$, $\delta(q_2, u) = q_3$, $\delta(q_3, u'') = q_4$, and $\delta(q_4, v'') = q_5$. Moreover, let
$\delta(q_0, u') = q_a$, $\delta(q_a, u) = q_b$, $\delta(q_b, u'') = q_c$, and $\delta(q_0, u) = q_\alpha$. Since $u, w \in L$, we know
that $q_\alpha$ and $q_5$ are accepting states. Since $v \notin L$, we know that $q_c$ is not accepting.

Automaton $M'$ operates as follows. In the initial state $[q_0, q_0, q_0, 1]$ we process the symbols
of $v'$ using Rule 1, ending in the state $[q_1, q_0, q_0, 1]$. At this point, we use Rule 2 to move
to $[q_1, q_0, q_0, 2]$ by an $\epsilon$-move. Next, we process the symbols of $u'$ using Rule 3, ending in
the state $[q_2, q_a, q_0, 2]$. Then we use Rule 4 to move to $[q_2, q_a, q_0, 3]$ by an $\epsilon$-move. Next, we
process the symbols of $u$ using Rule 5, ending in the state $[q_3, q_b, q_\alpha, 3]$. Then we use Rule
6 to move to $[q_3, q_b, q_\alpha, 4]$ by an $\epsilon$-move. Next, we process the symbols of $u''$ using Rule 7,
ending in the state $[q_4, q_c, q_\alpha, 4]$. Then we use Rule 8 to move to $[q_4, q_c, q_\alpha, 5]$ by an $\epsilon$-move.
Finally, we process the symbols of $v''$ using Rule 9, ending in the state $[q_5, q_c, q_\alpha, 5]$, and this
state is in $F'$.

On the other hand, suppose $M'$ accepts the input $w$. Then we must have $\delta'(q_0', w) \cap F' \neq \emptyset$.
But the only way to reach a state in $F'$ is, by our construction, to apply Rules 1 through 9 in
that order, where odd-numbered rules can be used any number of times, and even-numbered
rules are used exactly once. Letting $v', u', u, u'', v''$ be the words labeling the uses of Rules 1,
3, 5, 7, and 9, respectively, we see that $w = v'u'uu''v''$, where $\delta(q_0, w) \in F$, $\delta(q_0, u) \in F$, and
$\delta(q_0, u'uu'') \notin F$. It follows that $u, w \in L$ and $v = u'uu'' \notin L$, and so $L$ is not factor-convex.

We have proved 

Theorem 1. If $M$ is a DFA with $n$ states and $L = L(M)$, there exists an NFA-$\epsilon$ $M'$ with $O(n^3)$ states and
transitions such that $M'$ accepts the language

    $L(M') = \{\, w \in \Sigma^* :$ there exist $u, v \in \Sigma^*$ such that $u$ is a factor of $v$,
        $v$ is a factor of $w$, and $u, w \in L$, $v \notin L \,\}$.

Corollary 2. We can decide if a given regular language $L$ accepted by a DFA with $n$ states
is factor-convex in $O(n^3)$ time.

Proof. Since $L$ is factor-convex if and only if $L(M') = \emptyset$, it suffices to check if $L(M') = \emptyset$
using depth-first search of a directed graph, in time linear in the number of vertices and
edges of $M'$. □
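
The decision procedure behind Corollary 2 can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the DFA is complete and is given as a dictionary `delta` mapping `(state, symbol)` pairs to states, and the names `is_factor_convex`, `alphabet`, `delta`, `q0`, and `accepting` are ours rather than part of the construction. Instead of materializing $M'$, the search generates its reachable states on the fly; a breadth-first worklist is used in place of depth-first search, which does not affect the linear running time.

```python
from collections import deque

def is_factor_convex(alphabet, delta, q0, accepting):
    """Return True iff the language of the complete DFA is factor-convex.

    A state of M' is (p, q, r, mode): p, q, r follow the DFA on w, v, u,
    and mode in {1, ..., 5} is the flag of the construction.
    """
    start = (q0, q0, q0, 1)
    seen = {start}
    queue = deque([start])
    while queue:
        p, q, r, mode = queue.popleft()
        if mode == 5 and p in accepting and q not in accepting and r in accepting:
            return False  # M' accepts some w, so a witness (u, v, w) exists
        successors = []
        if mode < 5:
            successors.append((p, q, r, mode + 1))  # epsilon-moves: Rules 2, 4, 6, 8
        for a in alphabet:
            if mode == 1:    # Rule 1: reading v'
                successors.append((delta[p, a], q, r, 1))
            elif mode == 2:  # Rule 3: reading u'
                successors.append((delta[p, a], delta[q, a], r, 2))
            elif mode == 3:  # Rule 5: reading u
                successors.append((delta[p, a], delta[q, a], delta[r, a], 3))
            elif mode == 4:  # Rule 7: reading u''
                successors.append((delta[p, a], delta[q, a], r, 4))
            else:            # Rule 9: reading v''
                successors.append((delta[p, a], q, r, 5))
        for s in successors:
            if s not in seen:
                seen.add(s)
                queue.append(s)
    return True
```

For instance, with the two-state DFA for $(aa)^*$ the function reports a failure of factor-convexity, corresponding to the witness $(\epsilon, a, aa)$.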

2.1.1 Factor-closure 

The language $L$ is not factor-closed if and only if there exist words $v, w$ such that $v$ is a
factor of $w$, and $w \in L$, while $v \notin L$.

Given a DFA $M$ accepting $L$, we construct from $M$ an NFA-$\epsilon$ $M'$ such that

    $L(M') = \{\, w \in \Sigma^* :$ there exists $v \in \Sigma^*$ such that $v$ is a factor of $w$,
        and $w \in L$, $v \notin L \,\}$.

As before, $L(M') = \emptyset$ if and only if $L(M)$ is factor-closed. The size of $M'$ is $O(n^2)$.

States of M' are triples, where components 1 and 2 keep track of where M would be 
upon processing w, and v (respectively). The last component is a flag as before. 

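Formally, $M' = (Q', \Sigma, \delta', q_0', F')$, where

    $Q' = Q \times Q \times \{1, 2, 3\}$;

    $q_0' = [q_0, q_0, 1]$;

    $F' = F \times (Q - F) \times \{3\}$; and

1. $\delta'([p, q_0, 1], a) = \{[\delta(p, a), q_0, 1]\}$, for all $p \in Q$, $a \in \Sigma$;

2. $\delta'([p, q_0, 1], \epsilon) = \{[p, q_0, 2]\}$, for all $p \in Q$;

3. $\delta'([p, q, 2], a) = \{[\delta(p, a), \delta(q, a), 2]\}$, for all $p, q \in Q$, $a \in \Sigma$;

4. $\delta'([p, q, 2], \epsilon) = \{[p, q, 3]\}$, for all $p, q \in Q$;

5. $\delta'([p, q, 3], a) = \{[\delta(p, a), q, 3]\}$, for all $p, q \in Q$, $a \in \Sigma$.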

$M'$ has $2n^2 + n$ states and $(2|\Sigma| + 1)n^2 + (|\Sigma| + 1)n$ transitions. Thus we have:

Theorem 3. We can decide if a given regular language $L$ accepted by a DFA with $n$ states
is factor-closed in $O(n^2)$ time.

This result was previously obtained by Beal et al. [Prop. 5.1, p. 13] through a slightly
different approach.
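
As an illustration of the factor-closure automaton, the following Python sketch searches it for a shortest accepted word. It is a sketch under our own assumptions, not the construction of the cited work: the DFA is complete and given as a dictionary `delta` from `(state, symbol)` to states, and the function name `shortest_factor_closure_witness` is hypothetical. Since $\epsilon$-moves read no letter, a 0-1 breadth-first search (cost 0 for $\epsilon$-moves, cost 1 for letters) yields the length of a shortest $w \in L$ that has a factor outside $L$.

```python
from collections import deque

def shortest_factor_closure_witness(alphabet, delta, q0, accepting):
    """Length of a shortest w in L having a factor v not in L, or None.

    States are (p, q, mode): p follows the DFA on w, q follows it on the
    guessed factor v, and mode in {1, 2, 3} is the flag of the construction.
    """
    start = (q0, q0, 1)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        p, q, mode = state
        d = dist[state]
        if mode == 3 and p in accepting and q not in accepting:
            return d
        moves = []
        if mode < 3:
            moves.append(((p, q, mode + 1), 0))                   # Rules 2 and 4
        for a in alphabet:
            if mode == 2:
                moves.append(((delta[p, a], delta[q, a], 2), 1))  # Rule 3: inside v
            else:
                moves.append(((delta[p, a], q, mode), 1))         # Rules 1 and 5
        for nxt, cost in moves:
            if nxt not in dist or d + cost < dist[nxt]:
                dist[nxt] = d + cost
                if cost == 0:
                    queue.appendleft(nxt)  # zero-cost edges go to the front
                else:
                    queue.append(nxt)
    return None
```

A return value of `None` means that no final state of $M'$ is reachable, that is, the language is factor-closed.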

The converse of the relation "$u$ is a factor of $v$" is "$v$ contains $u$ as a factor". This converse
relation and similar converse relations, derived from the prefix, suffix, and subword relations,
lead to "converse-closed languages" [2]. It has been shown by de Luca and Varricchio [S] that
a language $L$ is factor-closed (factorial, in their terminology) if and only if it is a complement
of an ideal, that is, if and only if $L = \overline{\Sigma^* K \Sigma^*}$ for some $K \subseteq \Sigma^*$. Ang and Brzozowski [2]
noted that a language is an ideal if and only if it is converse-factor-closed, that is, if, for
every $u \in L$, each word of the form $v = xuy$ is also in $L$. Thus, to test whether $L$ is
converse-factor-closed, we must check that there is no pair $(u, v)$ such that $u \in L$, $v \notin L$, and $u$ is a
factor of $v$. This is equivalent to testing whether the complement $\overline{L}$ is factor-closed. Since a
DFA for $\overline{L}$ is obtained by exchanging accepting and rejecting states, the following is an
immediate consequence of Theorem 3.

Corollary 4. We can decide if a given regular language $L$ accepted by a DFA with $n$ states
is an ideal in $O(n^2)$ time.

The results above also apply to other converse-closed languages. Similarly, any result
about the size of a witness demonstrating the lack of prefix-, suffix-, or subword-closure applies
also to the witness demonstrating the lack of converse-prefix-, converse-suffix-, or
converse-subword-closure, respectively. Subword-closed and converse-subword-closed languages were
also investigated and characterized by Thierrin [11].

2.1.2 Factor-freeness 

Factor-free languages (also known as infix-free) have recently been studied by Han et al. 
[Hj; they gave an efficient algorithm for determining if the language accepted by an NFA is 
prefix-free, suffix-free, or factor-free. 

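We can decide whether a DFA language is factor-free in $O(n^2)$ time with the automaton
we used for testing factor-closure, except that the set of accepting states is now

    $F' = F \times F \times \{3\}$,

and at least one symbol must be read under Rule 1 or Rule 5, so that $v$ is a proper factor of $w$.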

Similar results hold for prefix-free, suffix-free, and subword-free languages. 
2.2 Prefix-convexity

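Prefix-convexity can be tested in an analogous fashion. We give the construction of $M'$
without proof: let $M' = (Q', \Sigma, \delta', q_0', F')$, where

    $Q' = Q \times Q \times Q \times \{1, 2, 3\}$;
    $q_0' = [q_0, q_0, q_0, 1]$;
    $F' = F \times (Q - F) \times F \times \{3\}$;
    $\delta'([p, q, r, 1], a) = \{[\delta(p, a), \delta(q, a), \delta(r, a), 1]\}$ for $p, q, r \in Q$, $a \in \Sigma$;
    $\delta'([p, q, r, 1], \epsilon) = \{[p, q, r, 2]\}$ for $p, q, r \in Q$;
    $\delta'([p, q, r, 2], a) = \{[\delta(p, a), \delta(q, a), r, 2]\}$ for $p, q, r \in Q$, $a \in \Sigma$;
    $\delta'([p, q, r, 2], \epsilon) = \{[p, q, r, 3]\}$ for $p, q, r \in Q$;
    $\delta'([p, q, r, 3], a) = \{[\delta(p, a), q, r, 3]\}$ for $p, q, r \in Q$, $a \in \Sigma$.

The NFA-$\epsilon$ $M'$ has $3n^3$ states and $3(|\Sigma| + 1)n^3$ transitions.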

2.2.1 Prefix-closure 

By varying the construction as before, we have 

Theorem 5. We can decide if a given regular language $L$ accepted by a DFA with $n$ states
is prefix-closed, suffix-closed, or subword-closed in $O(n^2)$ time.

2.2.2 Prefix-freeness

See Section 2.1.2.



2.3 Suffix-convexity 

Suffix-convexity can be tested in an analogous fashion. We give the construction of $M'$
without proof. Let $M' = (Q', \Sigma, \delta', q_0', F')$, where

    $Q' = Q \times Q \times Q \times \{1, 2, 3\}$;
    $q_0' = [q_0, q_0, q_0, 1]$;
    $F' = F \times (Q - F) \times F \times \{3\}$;
    $\delta'([p, q_0, q_0, 1], a) = \{[\delta(p, a), q_0, q_0, 1]\}$ for $p \in Q$, $a \in \Sigma$;
    $\delta'([p, q_0, q_0, 1], \epsilon) = \{[p, q_0, q_0, 2]\}$ for $p \in Q$;
    $\delta'([p, q, q_0, 2], a) = \{[\delta(p, a), \delta(q, a), q_0, 2]\}$ for $p, q \in Q$, $a \in \Sigma$;
    $\delta'([p, q, q_0, 2], \epsilon) = \{[p, q, q_0, 3]\}$ for $p, q \in Q$;
    $\delta'([p, q, r, 3], a) = \{[\delta(p, a), \delta(q, a), \delta(r, a), 3]\}$ for $p, q, r \in Q$, $a \in \Sigma$.

The NFA-$\epsilon$ $M'$ has $n^3$ states and $|\Sigma|n^3 + (|\Sigma| + 1)(n^2 + n)$ transitions.

For results on suffix-closure and suffix-freeness, see Theorem 5 and Section 2.1.2, respectively.

2.4 Subword-convexity

Subword-convexity can be tested in an analogous fashion. We give the construction of $M'$
without proof. Let $M' = (Q', \Sigma, \delta', q_0', F')$, where

    $Q' = Q \times Q \times Q$;
    $q_0' = [q_0, q_0, q_0]$;
    $F' = F \times (Q - F) \times F$;
    $\delta'([p, q, r], a) = \{[\delta(p, a), q, r], [\delta(p, a), \delta(q, a), r], [\delta(p, a), \delta(q, a), \delta(r, a)]\}$,
        for all $p, q, r \in Q$ and $a \in \Sigma$.

The NFA $M'$ has $n^3$ states and $|\Sigma|n^3$ transitions.

The idea is that as the symbols of w are read, we keep track of the state of M in the 
first component. We then "guess" which symbols of the input also belong to u and/or v, 
enforcing the condition that, if a symbol belongs to u, then it must belong to v, and if it 
belongs to v , then it must belong to w. We therefore cover all possibilities of words u, v such 
that u is a subword of v and v is a subword of w. 
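
The guessing idea just described can be sketched in Python as follows. This is a minimal sketch under our own assumptions: the DFA is complete and given as a dictionary `delta` from `(state, symbol)` to states, and the name `is_subword_convex` is hypothetical. For each input symbol, the three successors correspond to the symbol belonging to $w$ only, to $v$ and $w$, or to all of $u$, $v$, and $w$.

```python
def is_subword_convex(alphabet, delta, q0, accepting):
    """Return True iff the language of the complete DFA is subword-convex.

    A state (p, q, r) records where the DFA is after the symbols read so far
    that were assigned to w, to v, and to u, respectively.
    """
    start = (q0, q0, q0)
    seen = {start}
    stack = [start]
    while stack:
        p, q, r = stack.pop()
        if p in accepting and q not in accepting and r in accepting:
            return False  # some (u, v, w) with u, w in L and v not in L
        for a in alphabet:
            pa = delta[p, a]
            guesses = (
                (pa, q, r),                        # symbol belongs to w only
                (pa, delta[q, a], r),              # symbol belongs to v and w
                (pa, delta[q, a], delta[r, a]),    # symbol belongs to u, v and w
            )
            for s in guesses:
                if s not in seen:
                    seen.add(s)
                    stack.append(s)
    return True
```

On the three-state DFA for $a^*b^*$, which is subword-closed and hence subword-convex, the search finds no accepting triple.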

For results on subword-closure and subword-freeness, see Theorem 5 and Section 2.1.2,
respectively.



2.5 Almost convex languages 

As we have seen, a language $L$ is prefix-convex if and only if there are no triples $(u, v, w)$
with $u$ a prefix of $v$, $v$ a prefix of $w$, and $u, w \in L$, $v \notin L$. We call such a triple a witness.
A language could fail to be prefix-convex because there are infinitely many witnesses (for
example, the language $(aa)^*$), or it could fail because there is at least one, but only finitely
many witnesses (for example, the language $\epsilon + aa$, whose only witness is $(\epsilon, a, aa)$).

We define a language $L$ to be almost prefix-convex if there exists at least one, but only
finitely many witnesses to the failure of the prefix-convex property. Analogously, we define
almost suffix-, almost factor-, and almost subword-convex.

Theorem 6. Let $L$ be a regular language accepted by a DFA with $n$ states. Then we can
determine if $L$ is almost prefix-convex (respectively, almost suffix-convex, almost factor-convex,
almost subword-convex) in $O(n^3)$ time.

Proof. We give the proof for the almost factor-convex property, leaving the other cases to 
the reader. 

Consider the NFA-$\epsilon$ $M'$ defined in Section 2.1. As we have seen, $M'$ accepts the language

    $L(M') = \{\, w \in \Sigma^* :$ there exist $u, v \in \Sigma^*$ such that $u$ is a factor of $v$,
        $v$ is a factor of $w$, and $u, w \in L$, $v \notin L \,\}$.

Assuming $L$ is not factor-convex (which we can check by Corollary 2), $M'$ accepts an infinite
language if and only if $L$ is not almost factor-convex. For if
$M'$ accepts infinitely many distinct words, then there are infinitely many distinct witnesses,
while if there are infinitely many distinct witnesses $(u, v, w)$, then there must be infinitely
many distinct $w$ among them, since the lengths $|u|$ and $|v|$ are bounded by $|w|$.

Thus it suffices to see if $M'$ accepts an infinite language. If $M'$ were an NFA, this would
be trivial: first, we remove all states not reachable from the start state or from which we
cannot reach a final state. Next, we look for the existence of a cycle. All three goals can be
easily accomplished in time linear in the size of $M'$, using depth-first search.

However, $M'$ is an NFA-$\epsilon$, so there is one additional complication: namely, that the cycle
we find might be labeled completely by $\epsilon$-transitions. To solve this, we use an idea suggested
to us by Jack Zhao and Timothy Chan (personal communication): we find all the strongly connected
components of the transition graph of $M'$ (which can be done in linear time) and then,
for each edge $(p, q)$ labeled with something other than $\epsilon$ (corresponding to the transition
$q \in \delta'(p, a)$ for some $a \in \Sigma$), we check to see if $p$ and $q$ are in the same strongly connected
component. If they are, we have found a cycle labeled with something other than $\epsilon$. This technique runs
in linear time in the size of the NFA-$\epsilon$. □
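
The infiniteness test in the proof above can be sketched in Python as follows. This is a sketch under our own assumptions: the NFA-$\epsilon$ is given as a list of `(source, label, target)` edges with `None` standing for $\epsilon$, and the name `accepts_infinite_language` is hypothetical. We use Kosaraju's two-pass algorithm for the strongly connected components; any linear-time algorithm would serve equally well.

```python
from collections import defaultdict

def accepts_infinite_language(edges, start, finals):
    """True iff the NFA-epsilon accepts infinitely many words.

    edges: iterable of (src, label, dst); label None is an epsilon-move.
    """
    edges = list(edges)
    fwd, bwd = defaultdict(list), defaultdict(list)
    for p, _, q in edges:
        fwd[p].append(q)
        bwd[q].append(p)

    def reach(sources, graph):
        seen, stack = set(sources), list(sources)
        while stack:
            for y in graph[stack.pop()]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        return seen

    # Keep only states that are reachable and from which a final state is reachable.
    useful = reach([start], fwd) & reach(finals, bwd)

    # First pass: record finishing order of an iterative depth-first search.
    order, visited = [], set()
    for root in useful:
        if root in visited:
            continue
        visited.add(root)
        stack = [(root, iter(fwd[root]))]
        while stack:
            node, successors = stack[-1]
            for y in successors:
                if y in useful and y not in visited:
                    visited.add(y)
                    stack.append((y, iter(fwd[y])))
                    break
            else:
                order.append(node)
                stack.pop()

    # Second pass on the reversed graph: label each strongly connected component.
    comp = {}
    for root in reversed(order):
        if root in comp:
            continue
        comp[root] = root
        stack = [root]
        while stack:
            for y in bwd[stack.pop()]:
                if y in useful and y not in comp:
                    comp[y] = root
                    stack.append(y)

    # A non-epsilon edge inside one component lies on a cycle that reads a letter.
    return any(a is not None and p in useful and q in useful and comp[p] == comp[q]
               for p, a, q in edges)
```

A cycle consisting only of $\epsilon$-moves, or a cycle among states that cannot reach a final state, is correctly ignored by this test.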

2.5.1 Almost closed languages 

In analogy with Section 2.5, we can define a language $L$ to be almost prefix-closed if there
exists at least one, but only finitely many witnesses to the failure of the prefix-closed property.
Analogously, we define almost suffix-, almost factor-, and almost subword-closed.

Theorem 7. Let $L$ be a regular language accepted by a DFA with $n$ states. Then we can
determine if $L$ is almost prefix-closed (respectively, almost suffix-closed, almost factor-closed,
almost subword-closed) in $O(n^2)$ time.

Proof. Just like the proof of Theorem 6. □

2.5.2 Almost free languages 

In a similar way, we can define a language L to be almost prefix-free if there exists at least 
one, but only finitely many witnesses to the failure of the prefix-free property. Analogously, 
we define almost suffix-, almost factor-, and almost subword-free. 

Theorem 8. Let $L$ be a regular language accepted by a DFA with $n$ states. Then we can
determine if $L$ is almost prefix-free (respectively, almost suffix-free, almost factor-free, almost
subword-free) in $O(n^2)$ time.

Proof. Just like the proof of Theorem 6. □

3 Minimal witnesses 

Let $\trianglelefteq$ represent one of the four relations: factor, prefix, suffix, or subword. A necessary and
sufficient condition that a language $L$ be not $\trianglelefteq$-convex is the existence of a triple $(u, v, w)$
of words, where $u, w \in L$, $v \notin L$, $u \trianglelefteq v$, and $v \trianglelefteq w$. As before, we call such a triple a witness
to the lack of $\trianglelefteq$-convexity. A witness $(u, v, w)$ is minimal if every other witness $(u', v', w')$
satisfies $|w| < |w'|$, or $|w| = |w'|$ and $|v| < |v'|$, or $|w| = |w'|$, $|v| = |v'|$, and $|u| \le |u'|$. The
size of a witness is $|w|$.

Similarly, if $L$ is not $\trianglelefteq$-closed, then $(v, w)$ is a witness if $w \in L$, $v \notin L$, and $v \trianglelefteq w$. A
witness $(v, w)$ is minimal if there exists no witness $(v', w')$ such that $|w'| < |w|$, or $|w'| = |w|$
and $|v'| < |v|$. The size is again $|w|$. For $\trianglelefteq$-freeness, witness, minimal witness, and size are
defined as for $\trianglelefteq$-closure, except that both words are in $L$ and $v \neq w$.

Suppose we are given a regular language $L$ specified by an $n$-state DFA $M$, and we know
that $L$ is not $\trianglelefteq$-convex (respectively, $\trianglelefteq$-closed or $\trianglelefteq$-free). A natural question then is, what
is a good upper bound on the size of the shortest witness that demonstrates the lack of this
property?

3.1 Factor-convexity

From Theorem 1 we get an $O(n^3)$ upper bound for a witness to the lack of factor-convexity.

Corollary 9. Suppose $L$ is accepted by a DFA with $n$ states and $L$ is not factor-convex.
Then there exists a witness $(u, v, w)$ such that $|w| \le 3n^3 + n^2 + n - 1$.

Proof. In our proof of Theorem 1 we constructed an NFA-$\epsilon$ $M'$ with $3n^3 + n^2 + n$ states
accepting $L(M') = \{\, w \in \Sigma^* :$ there exist $u, v \in \Sigma^*$ such that $(u, v, w)$ is a witness$\,\}$. Thus,
if $L$ is not factor-convex, $M'$ accepts such a word $w$, and the length of a shortest such $w$ is clearly bounded
above by the number of states of $M'$ minus 1. □

It turns out that the bound in Corollary 9 is best possible, up to a constant factor:

Theorem 10. There exists a class of non-factor-convex regular languages $L_n$, accepted by
DFA's with $O(n)$ states, such that the size of the minimal witness is $\Omega(n^3)$.

The proof is postponed to Section 3.3 below.

Results analogous to Corollary 9 hold for prefix-, suffix-, and subword-convex languages.
However, in some cases we can do better, as we show below.

3.1.1 Factor-closure 

Theorem 3 gives us an $O(n^2)$ upper bound on the length of a witness to the failure of the
factor-closed property:

Corollary 11. If $L$ is accepted by a DFA with $n$ states and $L$ is not factor-closed, then there
exists a witness $(v, w)$ such that $|w| \le 2n^2 + n - 1$.

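It turns out that this $O(n^2)$ upper bound is best possible. Let $M = (Q, \Sigma, \delta, q_0, F)$ be a
DFA, where $Q = \{q_0, q_1, \ldots, q_n, q_{n+1}, p_0, p_1, \ldots, p_n, p_{n+1}\}$, $\Sigma = \{0, 1\}$, $F = Q \setminus \{q_{n+1}\}$. For
$1 \le i \le n$, $0 \le j \le n$, the transition function is

    $\delta(q_0, 0) = q_0$,
    $\delta(q_0, 1) = q_1$,
    $\delta(q_i, 0) = \begin{cases} q_{i+1}, & \text{if } i < n; \\ q_1, & \text{if } i = n, \end{cases}$
    $\delta(q_i, 1) = \begin{cases} q_1, & \text{if } i < n - 1; \\ p_0, & \text{if } i = n - 1; \\ q_{n+1}, & \text{if } i = n, \end{cases}$
    $\delta(q_{n+1}, 0) = \delta(q_{n+1}, 1) = q_{n+1}$,
    $\delta(p_j, 0) = \begin{cases} p_{j+1}, & \text{if } j < n; \\ p_0, & \text{if } j = n, \end{cases}$
    $\delta(p_j, 1) = \begin{cases} q_{n+1}, & \text{if } j < n; \\ p_{n+1}, & \text{if } j = n, \end{cases}$
    $\delta(p_{n+1}, 0) = \delta(p_{n+1}, 1) = q_{n+1}$.

The DFA $M$ has $2n + 4$ states. For $n = 5$, $M$ is illustrated in Figure 1.

Figure 1: Example of the construction in Theorem 12 for $n = 5$. All unspecified transitions
go to a rejecting "dead state" $q_6$ (not shown) that cycles on all inputs.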

Then we have the following theorem: 

Theorem 12. For the DFA $M$ above, let $L = L(M)$. For any witness $(u, v)$ to the lack of
factor-closure (with $u \notin L$ a factor of $v \in L$) we have $|v| \ge (n + 1)^2 - 1$, and this bound is achievable.

Proof. Let $(u, v)$ be a minimal witness. Since the only rejecting state $q_{n+1}$ in $M$ leads only
to itself, all the states along the accepting path of $v$ are final. We claim that $u$ is a suffix
of $v$, that is, $v = wu$ for some $w$. Otherwise, we can omit the last letter of $v$: the resulting word
is still accepted and still contains $u$ as a factor, which contradicts the minimality of $v$. Similarly,
all the states along the rejecting path of $u$ except the last one are final; otherwise, we get a
shorter $u$.

First, we prove that the set of states along the accepting path of $v$ includes both $q$
states and $p$ states. Let $u = 0^i 1 u'$ for $i \ge 0$. Then $\delta(q_1, u') = q_{n+1}$. If $\delta(q_0, w0^i)$ is a $p$
state, we are done. Otherwise, let $\delta(q_0, w0^i) = q_k$ for some $0 \le k \le n$. If $k = n$, then
$\delta(q_0, v) = \delta(q_0, w0^i 1u') = \delta(q_k, 1u') = \delta(q_n, 1u') = \delta(q_{n+1}, u') = q_{n+1}$, a contradiction. If
$k = n - 1$, then $\delta(q_0, w0^i 1) = \delta(q_k, 1) = p_0$, which is a $p$ state. Otherwise, $\delta(q_0, v) =
\delta(q_k, 1u') = \delta(q_1, u') = q_{n+1}$, a contradiction. Hence, the set of states along the accepting
path of $v$ includes both $q$ states and $p$ states.

Now, consider the set of states along the rejecting path of $u$. We prove that the set
of states along the rejecting path of $u$ includes only $q$ states. Suppose it includes both $q$
states and $p$ states. Since there is only one transition from a $q$ state to a $p$ state and all
transitions from a $p$ state to a $q$ state are to the rejecting state $q_{n+1}$, we have $u = u_1 u_2$,
where $\delta(q_0, u_1) = q_{n-1}$, and

    $u_2 \in L_1 = 1(0^{n+1})^*(\epsilon + 0 + 00 + \cdots + 0^{n-1})1$.

Since $u$ is a suffix of $v$, the last letter of $v$ is also 1. So, by the construction of $M$, we have
that $v = v_1 v_2$, where $\delta(q_0, v_1) = q_{n-1}$, and

    $v_2 \in L_2 = 1(0^{n+1})^* 0^n 1$.

It is obvious that $(\Sigma^* L_1) \cap (\Sigma^* L_2) = \emptyset$, which contradicts the equality $v_1 v_2 = v = wu =
w u_1 u_2$. Therefore, the set of states along the rejecting path of $u$ includes only $q$ states.
Consider the last block of 0's in the words $u$ and $v$. By the structure of $M$, we have

    $u \in \Sigma^* 1(0^n)^* 0^{n-1} 1$,

and

    $v \in \Sigma^* 1(0^{n+1})^* 0^n 1$.

Since $u$ is a suffix of $v$, these two blocks coincide, so their common length $k$ satisfies
$k \equiv n - 1 \pmod{n}$ and $k \equiv n \pmod{n + 1}$, that is, $k \equiv -1 \pmod{n(n + 1)}$.
Therefore, the length of the last block of 0's is at least $n(n + 1) - 1$. In other words,
$|u| \ge n(n + 1) - 1 + 2 = n^2 + n + 1$. Since the shortest word that leads to state $q_{n-1}$
(which is the only state having a transition to a $p$ state on input 1) is $10^{n-2}$, we also have
$|v| \ge 1 + n - 2 + n^2 + n + 1 = n^2 + 2n$, and the first part of the theorem is proved.

To see that equality is achieved, let $u = 10^{n^2+n-1}1$ and $v = 10^{n-2}u$. □
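
The arithmetic behind the length of the last block of 0's can be checked with a short Python sketch. It is an illustration only, and the function name `shortest_common_block` is ours: the block length $k$ must be congruent to $n - 1$ modulo $n$ (so that $u$ is rejected) and to $n$ modulo $n + 1$ (so that $v$ is accepted), and the sketch finds the least such $k$ by exhaustive search.

```python
def shortest_common_block(n):
    """Smallest k >= 0 with k = n - 1 (mod n) and k = n (mod n + 1)."""
    k = 0
    while not (k % n == (n - 1) % n and k % (n + 1) == n):
        k += 1
    return k
```

For every $n \ge 1$ the search stops at $k = n(n + 1) - 1$, in agreement with the bound used in the proof.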

3.1.2 Factor-freeness 

From the remarks in Section 2.1.2 we get

Corollary 13. If $L$ is accepted by a DFA with $n$ states and $L$ is not factor-free, then there
exists a witness $(v, w)$ such that $|w| \le 2n^2 + n - 1$.

Up to a constant, Corollary 13 is best possible, as the following theorem shows.

Theorem 14. There exists a class of languages accepted by DFA's with $O(n)$ states, such
that the smallest witness showing the language not factor-free is of size $\Omega(n^2)$.

Proof. Let $L = bb(a^n)^+ b \cup b(a^{n+1})^+ b$. This language can be accepted by a DFA with $2n + 6$
states. However, the shortest witness to lack of factor-freeness is $(ba^{n(n+1)}b, bba^{n(n+1)}b)$,
which has size $n^2 + n + 3$. □
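
The claim in the proof of Theorem 14 can be checked by brute force for small $n$. The following Python sketch is an illustration under our own assumptions (the function name `shortest_factor_free_witness` and the cutoff `max_len` are hypothetical): it tests membership in $L$ with a regular expression, enumerates the words of $L$ by increasing length, and looks for a proper factor that is also in $L$.

```python
import re

def shortest_factor_free_witness(n, max_len=200):
    """Return (|w|, v, w) for a shortest witness, or None if none has |w| <= max_len.

    L = bb(a^n)^+ b  U  b(a^(n+1))^+ b, and v must be a proper factor of w.
    """
    member = re.compile(rf"bb(a{{{n}}})+b|b(a{{{n + 1}}})+b").fullmatch
    for length in range(1, max_len + 1):
        for prefix in ("b", "bb"):
            # Every word of L has the shape b a^m b or bb a^m b.
            w = prefix + "a" * (length - len(prefix) - 1) + "b"
            if len(w) != length or not member(w):
                continue
            for i in range(length):
                for j in range(i + 1, length + 1):
                    if j - i < length and member(w[i:j]):
                        return length, w[i:j], w
    return None
```

For $n = 2$ the search returns the witness $(ba^6b, bba^6b)$ of size $9 = n^2 + n + 3$.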

3.2 Prefix-convexity 

For prefix-convexity, we have the following theorem.

Theorem 15. Let $M$ be a DFA with $n$ states. Then if $L(M)$ is not prefix-convex, there
exists a witness $(u, v, w)$ with $|w| \le 2n - 1$. Furthermore, this bound is best possible, as for
all $n \ge 2$, there exists a unary DFA with $n$ states that achieves this bound.

Proof. If $L(M)$ is not prefix-convex, then such a witness $(u, v, w)$ exists. Without loss of
generality, assume that $(u, v, w)$ is minimal. Now write $w = uyz$, where $v = uy$ and $w = vz$.

Let $\delta(q_0, u) = p$, $\delta(p, y) = q$, and $\delta(q, z) = r$. Let $P$ be the path from $q_0$ to $r$ traversed
by $w$, and let $P_1$ be the states from $q_0$ to $p$ (not including $p$), $P_2$ be the states from $p$ to
$q$ (not including $q$), and $P_3$ be the states from $q$ to $r$ (not including $r$). See Figure 2. Since
$(u, v, w)$ is minimal, we know that every state of $P_3$ is rejecting, since we could have found
a shorter $w$ if there were an accepting state among them. Similarly, every state of $P_2$ must
be accepting, for, if there were a rejecting state among them, we could have found a shorter
$y$ and hence a shorter $v$. Finally, every state of $P_1$ must be rejecting, since, if there were an
accepting state, we could have found a shorter $u$.




Figure 2: The acceptance path for w 



Let $r_i = |P_i|$ for $i = 1, 2, 3$. There are no repeated states in $P_3$, for if there were, we
could cut out the loop to get a shorter $w$; the same holds for $P_2$ and $P_1$. Thus $r_i \le n - 1$ for
$i = 1, 2, 3$.

Now $P_1$ and $P_2$ are disjoint, since all the states of $P_1$ are rejecting, while all the states
of $P_2$ are accepting. Similarly, the states of $P_3$ are disjoint from $P_2$. So $r_1 + r_2 \le n$ and
$r_2 + r_3 \le n$. It follows that $r_1 + r_2 + r_3 \le 2n - r_2$. Since $|w| = r_1 + r_2 + r_3$ and $r_2 \ge 1$,
it follows that $|w| \le 2n - 1$.

To see that $2n - 1$ is optimal, consider the DFA of $n$ states accepting the unary language
$L = a^{n-1}(a^n)^*$. Then $L$ is not prefix-convex, and the shortest witness is $(a^{n-1}, a^n, a^{2n-1})$. □
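
The optimality example can be checked directly for small $n$. The following Python sketch is an illustration for unary languages only, with names of our own choosing (`minimal_unary_witness`, `in_language`, `max_len`): a word $a^k$ is represented by its length $k$, and the loops visit candidate triples in the order that defines minimality (first $|w|$, then $|v|$, then $|u|$).

```python
def minimal_unary_witness(in_language, max_len):
    """Minimal witness (|u|, |v|, |w|) to the lack of prefix-convexity of a
    unary language, or None if no witness has |w| <= max_len.

    in_language(k) tells whether a^k belongs to the language.
    """
    for k in range(max_len + 1):          # |w|, smallest first
        if not in_language(k):
            continue
        for j in range(k):                # |v| < |w|, with v outside L
            if in_language(j):
                continue
            for i in range(j):            # |u| < |v|, with u inside L
                if in_language(i):
                    return (i, j, k)
    return None
```

With `in_language = lambda k: k % n == n - 1`, which describes $a^{n-1}(a^n)^*$, the result is $(n - 1, n, 2n - 1)$ for each $n \ge 2$ that we tried.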

3.2.1 Prefix-closure 

For prefix-closed languages we can get an even better bound. 

Theorem 16. Let $M$ be an $n$-state DFA, and suppose $L = L(M)$ is not prefix-closed. Then
the minimal witness $(v, w)$ showing $L$ is not prefix-closed has $|w| \le n$, and this is best
possible.

Proof. Assume that $(v, w)$ is a minimal witness. Consider the path $P$ from $q_0$ to $q = \delta(q_0, w)$,
passing through $p = \delta(q_0, v)$. Let $P_1$ denote the part of the path $P$ from $q_0$ to $p$ (not including
$p$) and $P_2$ denote the part of the path from $p$ to $q$ (not including $q$). Then all the states
traversed in $P_2$ must be rejecting, because if any were accepting we would get a shorter $w$.
Similarly, all the states traversed in $P_1$ must be accepting, because otherwise we could get
a shorter $v$. Neither $P_1$ nor $P_2$ contains a repeated state, because if they did, we could "cut
out the loop" to get a shorter $v$ or $w$. Furthermore, the states in $P_1$ are disjoint from $P_2$. So
the total number of states in the path to $w$ (not counting $q$) is at most $n$. Thus $|w| \le n$.

The result is best possible, as the example of the unary language $L = (a^n)^*$ shows. This
language is not prefix-closed, can be accepted by a DFA with $n$ states, and the smallest
witness is $(a, a^n)$. □

3.2.2 Prefix-freeness 

For the prefix-free property we have:

Theorem 17. If $L$ is accepted by a DFA with $n$ states and is not prefix-free, then there
exists a witness $(v, w)$ with $|w| \le 2n - 1$. The bound is best possible.

Proof. The proof is similar to that of Theorem 15. The bound is achieved by a unary DFA
accepting $a^{n-1}(a^n)^*$. □

3.3 Suffix-convexity 

For the suffix-convex property, the cubic upper bound implied by Corollary 9 is best possible,
up to a constant factor.

Theorem 18. There exists a class of non-suffix-convex regular languages $L_n$, accepted by
DFA's with $O(n)$ states, such that the size of the minimal witness is $\Omega(n^3)$.

Proof. Let

    $L = bbb(a^{n-1})^+ \cup bb(a + aa + \cdots + a^{n-1})(a^n)^* \cup b(a^{n+1})^+$.

Then $L$ can be accepted by a DFA with $3n + 5$ states, as illustrated in Figure 3.




Figure 3: Example of the construction in Theorem 18 for $n = 4$. All unspecified transitions
go to a rejecting "dead state" $d$ that cycles on all inputs.

Suppose $(u, v, w)$ is a witness; then $w$ cannot be a word of the form $ba^i$, because no
proper suffix of such a word is in $L$. Also, $w$ cannot be a word of the form $bba^i$, because the
only proper suffix in $L$ is $u = ba^i$. But then there is no word $v$ that lies strictly between $u$
and $w$ in the suffix order. So $w$ must be of the form $bbba^i$. The only proper suffixes of $w$ in
$L$ are of the form $bba^i$ and $ba^i$. But we cannot have $u = bba^i$ because, if we did, there would
be no $v$ strictly between $u$ and $w$ in the suffix order. So it must be that $u = ba^i$. Then the
only word in $\Sigma^*$ strictly between $u$ and $w$ in the suffix order is $v = bba^i$, and such a $v$ is not
in $L$ if and only if $i$ is a multiple of $n$. On the other hand, for $u$ and $w$ to be in $L$, $i$ must be
a multiple of $n + 1$ and $n - 1$, respectively.

It follows that $L$ is not suffix-convex and the shortest witness is $(ba^i, bba^i, bbba^i)$, where
$i = \mathrm{lcm}(n - 1, n, n + 1) \ge (n - 1)n(n + 1)/2$. □
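
The lower bound on the exponent $i$ can be checked numerically. The following Python sketch is an illustration only, with function names of our own choosing: since $n - 1$ and $n + 1$ share a common factor of 2 exactly when $n$ is odd, the least common multiple equals $(n - 1)n(n + 1)$ for even $n$ and half of that for odd $n$.

```python
from functools import reduce
from math import gcd

def lcm(*values):
    """Least common multiple of positive integers."""
    return reduce(lambda x, y: x * y // gcd(x, y), values)

def shortest_suffix_witness_exponent(n):
    """Smallest i >= 1 divisible by n - 1, n and n + 1 (assumes n >= 2)."""
    return lcm(n - 1, n, n + 1)
```

The size of the shortest witness is then $|bbba^i| = i + 3$, which is cubic in $n$.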

A similar technique can be used for non-factor-convex languages. This allows us to prove
Theorem 10.

Proof (of Theorem 10). Exactly like the proof of Theorem 18, except we use the language
$Lb$ instead. □

3.3.1 Suffix-closure 

Obviously, a witness to the failure of suffix-closure is also a witness to the failure of
factor-closure. So the proof of Theorem 12 shows that the bound $(n + 1)^2 - 1$ also holds for
suffix-closed languages.

Ang and Brzozowski pointed out [2] that a language $L$ is factor-closed if and only if $L$
is both prefix-closed and suffix-closed. Here is another relationship concerning the witnesses
for these properties.

Proposition 19. Let $M$ be a DFA of $n$ states, and $L = L(M)$. Let $v$ be the shortest word
such that there is $u \notin L$, $v \in L$, $|v| > n$ and $u$ is a factor of $v$. Then $u$ is a suffix of $v$.

Proof. Suppose $u$ is not a suffix of $v$. Write $v = v'a$ for $a \in \Sigma$. Then $u$ is also a factor of $v'$.
So, $v' \notin L$. Since $|v'| \ge n$, by the pumping lemma, $v' = xyz$, such that $xz \notin L$, $|xz| < |xyz|$.
But $xza \in L$ since $xyza = v'a = v \in L$. This contradicts that $v$ is the shortest. □

In other words, a long minimal witness for factor-closure must also be a witness for
suffix-closure.

3.3.2 Suffix- freeness 

Theorem 20. There exists a class of languages accepted by DFA's with $O(n)$ states, such
that the smallest witness showing the language not suffix-free is of size $\Omega(n^2)$.

Proof. Let $L = bb(a^n)^+ \cup b(a^{n+1})^+$. This language is accepted by a DFA with $2n + 5$
states. However, the shortest witness to the lack of suffix-freeness, $(ba^{n(n+1)}, bba^{n(n+1)})$, has
size $n^2 + n + 2$. □

3.4 Subword-convexity 

We now turn to subword properties. First, we recall some facts about the pumping lemma. If
$w = a_1 \cdots a_m$ with $a_i \in \Sigma$ for $1 \le i \le m$, we write $w[i, j]$ for the factor $a_i \cdots a_j$. Assume that
$M = (Q, \Sigma, \delta, q_0, F)$ is an $n$-state DFA, $m \ge n$, let $q \in Q$, and consider the state sequence

$$S(q, w) = (\delta(q, w[1, 0]), \ldots, \delta(q, w[1, m])).$$

We know that some state in $S(q, w)$ must appear more than once, because there are only $n$
distinct states in $M$. Let $\delta(q, w[1, i])$ be the first state that appears more than once in $S(q, w)$, and
let $x = w[1, i]$. Moreover, let $\delta(q, w[1, j])$ be the first state in $S(q, w)$ equal to $\delta(q, w[1, i])$ with $j > i$,
and let $y = w[i + 1, j]$. Finally, let $z = w[j + 1, m]$. Then $w = xyz$, where $|xy| \le n$, $|y| > 0$,
and $|z| \ge m - n$, and $\delta(q, x) = \delta(q, xy)$. By the pumping lemma, $xy^*z \subseteq L$. By the definition
of $x$ and $y$, all the states in the sequence $S(q, w[1, j - 1])$ are distinct. For a word $w$ with
$|w| = m \ge n$, we refer to the factorization $w = xyz$ as the canonical factorization of $w$ with
respect to $q$.
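
The canonical factorization can be computed by a single left-to-right scan of $w$. The following Python sketch is illustrative only. The function name, the encoding of $\delta$ as a dictionary keyed by (state, letter), and the reading of "the first state that appears more than once" as the state whose second occurrence comes earliest (the reading under which $|xy| \le n$ and the distinctness claim hold) are our assumptions.

```python
def canonical_factorization(delta, q, w):
    """Split w into (x, y, z) at the earliest repeated state on the path from q.

    delta maps (state, letter) to a state; the DFA is assumed complete.
    Returns None when no state repeats, which can only happen if len(w)
    is smaller than the number of states.
    """
    first_seen = {q: 0}  # state -> length of the shortest prefix reaching it
    state = q
    for pos, letter in enumerate(w, start=1):
        state = delta[(state, letter)]
        if state in first_seen:
            i = first_seen[state]
            # delta(q, w[:i]) == delta(q, w[:pos]), so y = w[i:pos] is a loop.
            return w[:i], w[i:pos], w[pos:]
        first_seen[state] = pos
    return None


if __name__ == "__main__":
    # Hypothetical 3-state unary DFA whose states form a single cycle.
    cycle = {(s, "a"): (s + 1) % 3 for s in range(3)}
    print(canonical_factorization(cycle, 0, "aaaaa"))
```

Because the scan stops at the first repetition, all states visited before reading the last letter of $y$ are distinct, which is the property used repeatedly in the proofs that follow.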

3.4.1 Subword-closure 

Here $v \trianglelefteq w$ means $v$ is a subword of $w$. If $L = L(M)$ is not subword-closed, then $(v, w)$ is a
witness if $w \in L$, $v \notin L$, and $v \trianglelefteq w$.

Lemma 21. Let $M$ be a DFA with $n \ge 2$ states such that $L(M)$ is not subword-closed. For
any witness $(v, w)$, there exists a witness $(v', w')$ with $|w'| \le n$ and $w' \trianglelefteq w$.

Proof. We will show that, for any witness $(v, w)$ with $|w| \ge n + 1$, we can find a witness
$(v', w')$ with $|w'| < |w|$ and $w' \trianglelefteq w$. The lemma then follows.

Suppose that $(v, w)$ is a minimal witness, and $|w| = m \ge n + 1$. Then the canonical
factorization of $w$ is $w = xyz$, where $|xy| \le n$, $|y| > 0$, and $|z| \ge m - n > 0$.

If there is a $z'$ such that $z' \trianglelefteq z$ and $xyz' \notin L$, then $xz' \notin L$, since $xyz'$ and $xz'$ lead to
the same state in $M$. Then $(xz', xz)$ is a witness with $|xz| < |w|$ and $xz \trianglelefteq w$. Thus we can
assume that

$$z' \trianglelefteq z \text{ implies } xyz' \in L. \qquad (1)$$

Since $v \trianglelefteq w = xyz$, we can write $v = v_x v_y v_z$, where $v_x \trianglelefteq x$, $v_y \trianglelefteq y$, and $v_z \trianglelefteq z$. Clearly, $v \trianglelefteq xyv_z$.
If $v_z \ne z$, then by (1), we have $xyv_z \in L$, and $(v, xyv_z)$ is a witness with $|xyv_z| < |w|$ and
$xyv_z \trianglelefteq w$. Thus we may assume that our witness has the form $(v_x v_y z, xyz)$.

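In the particular case that $z' = \varepsilon$, (1) implies that $xy \in L$. If $y' \trianglelefteq y$ and $xy' \notin L$, then
$(xy', xy)$ is a witness with $|xy| < |w|$ and $xy \trianglelefteq w$. Thus

$$y' \trianglelefteq y \text{ implies } xy' \in L. \qquad (2)$$

Finally, if $x' \trianglelefteq x$ and $x' \notin L$, then $(x', x)$ is a witness with $|x| < |w|$ and $x \trianglelefteq w$. Thus

$$x' \trianglelefteq x \text{ implies } x' \in L. \qquad (3)$$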

Altogether, we may assume that all the states along the path spelling $w$ in $M$ are accepting.
We know that the states in the sequence

$$S = (\delta(q_0, w[1, 0]), \ldots, \delta(q_0, w[1, |xy| - 1]))$$

are all distinct. Also, the states in the sequence

$$S' = (\delta(q_0, v_x v_y z[1, 1]), \ldots, \delta(q_0, v_x v_y z[1, |z| - 1]))$$

are all accepting and distinct; otherwise, $v$ would not be shortest.

We now claim that no state can be in both $S$ and $S'$. For suppose that $\delta(q_0, w[1, i]) =
\delta(q_0, v_x v_y z[1, k])$, for some $0 \le i < |x|$, $0 < k < |z|$. Then $(w[1, i]z[k + 1, |z|], xz)$ is a witness
with $|xz| < |w|$ and $xz \trianglelefteq w$, since $w[1, i] = x[1, i]$, and $x[1, i]z[k + 1, |z|] \trianglelefteq xz$. Next, if

$\delta(q_0, xy[1, j]) = \delta(q_0, v_x v_y z[1, k])$, for some $0 \le j < |y|$, $0 < k < |z|$, then

$$(xy[1, j]z[k + 1, |z|], \; xyz[k + 1, |z|])$$

is a witness with $|xyz[k + 1, |z|]| < |w|$ and $xyz[k + 1, |z|] \trianglelefteq w$, since $xy[1, j]z[k + 1, |z|] \trianglelefteq
xyz[k + 1, |z|]$, and $xyz[k + 1, |z|] \in L$ by (1).

Under these conditions $M$ must have $|xy| + (|z| - 1) = |xyz| - 1$ distinct accepting states,
and at least one rejecting state. Hence $|xyz| = |w| \le n$ and we have found a witness with
the required properties. □

Corollary 22. Let $M$ be a DFA with $n \ge 2$ states. If $L(M)$ is not subword-closed, there
exists a witness $(v, w)$ with $|w| \le n$. Furthermore, this is the best possible bound, as there
exists a unary DFA with $n$ states that achieves this bound.

Proof. If $L$ is not subword-closed then it has a witness and, by Lemma 21, it has a witness
$(v, w)$ with $|w| \le n$. This is the best possible bound for $n \ge 2$, since the language

$$(a^n)^*(\varepsilon + a + \cdots + a^{n-2}),$$

accepted by a DFA with $n$ states, has a minimal witness $(a^{n-1}, a^n)$. □

For $n = 1$, $L$ is either $\emptyset$ or $\Sigma^*$, and both of these languages are subword-closed.

3.4.2 Subword-freeness 

Lemma 23. Let $M$ be a DFA with $n \ge 2$ states such that $L(M)$ is not subword-free. For
any witness $(u, w)$, there exists a witness $(u', w')$ with $|w'| \le 2n - 1$, and $w' \trianglelefteq w$.

Proof. We will show that, for any witness $(u, w)$ with $|w| \ge 2n$, we can find a witness $(u', w')$
with $|w'| < |w|$ and $w' \trianglelefteq w$. The lemma then follows.

Let the canonical factorization of $w$ with respect to $q_0$ be $w = xyz$, where $|xy| \le n$,
$|y| > 0$, and $|z| \ge n > 0$. Then we also have a canonical factorization of $z = x'y'z'$ with
respect to state $q = \delta(q_0, xy)$, where $|x'y'| \le n$, $|y'| > 0$, and $|z'| \ge 0$. Now we have a witness
$(xx'z', xx'y'z') = (xx'z', xz)$ with $|xz| < |w|$ and $xz \trianglelefteq w$. □
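
The shrinking step in this proof is constructive. The Python sketch below applies two canonical factorizations to a word $w \in L$ with $|w| \ge 2n$ and returns the smaller witness $(xx'z', xz)$. It is a hedged illustration: the names `run`, `canonical_factorization` and `shrink_freeness_witness`, and the dictionary encoding of $\delta$, are our own choices, and the code assumes a complete DFA and does not itself check that $w \in L$.

```python
def run(delta, q, w):
    """State reached from q after reading w in a complete DFA."""
    for letter in w:
        q = delta[(q, letter)]
    return q


def canonical_factorization(delta, q, w):
    """Split w into (x, y, z) at the earliest repeated state on the path from q."""
    first_seen = {q: 0}
    state = q
    for pos, letter in enumerate(w, start=1):
        state = delta[(state, letter)]
        if state in first_seen:
            i = first_seen[state]
            return w[:i], w[i:pos], w[pos:]
        first_seen[state] = pos
    return None


def shrink_freeness_witness(delta, q0, w):
    """One shrinking step: from w in L with |w| >= 2n, build (x x' z', x z).

    Both components reach the same state as w, so both are in L, and the
    first is a proper subword of the second because y' is nonempty.
    """
    x, y, z = canonical_factorization(delta, q0, w)
    q = run(delta, q0, x + y)
    x2, y2, z2 = canonical_factorization(delta, q, z)
    return x + x2 + z2, x + z


if __name__ == "__main__":
    # Hypothetical unary DFA for a^2 (a^3)^*: n = 3, accepting state 2.
    cycle = {(s, "a"): (s + 1) % 3 for s in range(3)}
    print(shrink_freeness_witness(cycle, 0, "a" * 8))
```

On the unary example with $n = 3$, one step applied to $a^8$ already yields a witness whose longer word has length $2n - 1 = 5$.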

Corollary 24. Let $M$ be a DFA with $n \ge 2$ states. If $L(M)$ is not subword-free, there exists
a witness $(u, w)$ with $|w| \le 2n - 1$. Furthermore, this is the best possible bound, as there
exists a unary DFA with $n$ states that achieves this bound.

Proof. If $L$ is not subword-free then it has a witness and, by Lemma 23, it has a witness
$(u, w)$ with $|w| \le 2n - 1$. This is the best possible bound for $n \ge 2$, since the language
$a^{n-1}(a^n)^*$, accepted by a DFA with $n$ states, has a minimal witness $(a^{n-1}, a^{2n-1})$. □

For $n = 1$, $L$ is either $\emptyset$ or $\Sigma^*$. Only $\Sigma^*$ is not subword-free, and has a minimal witness
$(\varepsilon, a)$ for any $a \in \Sigma$.

3.4.3 Subword-convexity 

Lemma 25. Let $M$ be a DFA with $n \ge 2$ states such that $L(M)$ is not subword-convex. For
any witness $(u, v, w)$, there exists a witness $(u', v', w')$ with $w' \trianglelefteq w$, and $|w'| \le 3n - 2$.

Proof. We will show that, for any witness $(u, v, w)$ with $|w| \ge 3n - 1$, we can find a witness
$(u', v', w')$ with $|w'| < |w|$ and $w' \trianglelefteq w$. The lemma then follows.

We may assume without loss of generality that v is a shortest possible word corresponding 
to the given w, and u is a shortest word corresponding to v and w. 

First, consider the witness $(u, v)$ for lack of subword-closure of the language $L$. By
Lemma 21 there exists a witness $(u', v')$ to the failure of the subword-closure property of $L$
such that $v' \trianglelefteq v$ and $|v'| \le n$. Therefore we can assume that we have a witness $(u, v, w)$ to
the failure of subword-convexity such that $|v| \le n$.

Suppose that $(u, v, w)$ is a minimal witness, and $|w| \ge 3n - 1$. Then the canonical
factorization of $w$ is $w = x_1 y_1 z_1$, where $|x_1 y_1| \le n$, $|y_1| > 0$, and $|z_1| \ge 2n - 1 > n > 0$.

Consider the states

$$p_0 = \delta(q_0, x_1 y_1), \quad p_1 = \delta(q_0, x_1 y_1 z_1[1, 1]), \quad \ldots, \quad p_{|z_1|} = \delta(q_0, x_1 y_1 z_1).$$

Since $|z_1| > n$, there must be at least one pair $(p_i, p_j)$ of states such that $p_i = p_j$. If $p_0$
is the state that is repeated, let $i$ be the greatest index such that $p_0 = p_i$, and let $x_2 = \varepsilon$,
$y_2 = z_1[1, i]$, and $z_2 = z_1[i + 1, |z_1|]$. If $p_i$ is the first state that is repeated, let $j$ be the greatest
index such that $p_i = p_j$, and let $x_2 = z_1[1, i]$, $y_2 = z_1[i + 1, j]$, and $z_2 = z_1[j + 1, |z_1|]$. If the sequence
$\delta(q_0, x_1 y_1 x_2 y_2), \delta(q_0, x_1 y_1 x_2 y_2 z_2[1, 1]), \ldots, \delta(q_0, x_1 y_1 x_2 y_2 z_2)$ has no repeated states, we stop.
Otherwise, we apply the same procedure to $z_2$, and so on. In any case, eventually we reach
a $z_k$ for which no repeated states exist. Then we have the factorization

$$w = x_1 y_1 x_2 y_2 \cdots x_k y_k z_k,$$

where $x_1 y_1^* x_2 y_2^* \cdots x_k y_k^* z_k \subseteq L$, $|x_2 \cdots x_k z_k| < n$ (otherwise, there would be repeated states),
$|y_i| > 0$, for $i = 1, \ldots, k$, and $k \ge 2$.

For any $y'_2 \trianglelefteq y_2, \ldots, y'_k \trianglelefteq y_k$, we have $x_1 y_1 x_2 y'_2 \cdots x_k y'_k z_k \in L$. Otherwise, the triple

$$(x_1 x_2 \cdots x_k z_k, \; x_1 x_2 y'_2 \cdots x_k y'_k z_k, \; x_1 x_2 y_2 \cdots x_k y_k z_k)$$

is a witness with $|x_1 x_2 y_2 \cdots x_k y_k z_k| < |w|$, and $x_1 x_2 y_2 \cdots x_k y_k z_k \trianglelefteq w$.

Since $v \trianglelefteq w$, we can now write $v = v_{x_1} v_{y_1} v_{x_2} v_{y_2} \cdots v_{x_k} v_{y_k} v_{z_k}$, where $v_{x_1} \trianglelefteq x_1$, etc. If there
is a $y_i$ with $i \ge 2$, such that $v_{y_i} = \varepsilon$, then we can replace that $y_i$ by $\varepsilon$ in $w$ and obtain a
smaller witness. Hence each $v_{y_i}$ must be nonempty. By the same argument, if there is a
letter in $y_i$, for $i \ge 2$, that is not used in $v_{y_i}$, then that letter can be removed, yielding a
smaller witness. Therefore $y_i = v_{y_i}$ for $i = 2, \ldots, k$. We claim that $|y_2 \cdots y_k| < |v|$; otherwise
$v = v_{y_2} \cdots v_{y_k} = y_2 \cdots y_k$ and $(u, v, x_1 x_2 y_2 \cdots x_k y_k z_k)$ is a witness with $|x_1 x_2 y_2 \cdots x_k y_k z_k| <
|w|$. Thus $|y_2 \cdots y_k| < |v| \le n$, and

$$|w| = |x_1 y_1| + |x_2 \cdots x_k z_k| + |y_2 \cdots y_k| \le n + (n - 1) + (n - 1) = 3n - 2.$$

□ 

Corollary 26. Let $M$ be a DFA with $n \ge 2$ states. If $L(M)$ is not subword-convex, there
exists a witness $(u, v, w)$ with $|w| \le 3n - 2$.

We do not know whether $3n - 2$ is the best bound. The unary language $a^{n-1}(a^n)^*$ is
accepted by a DFA with $n$ states and has a minimal witness $(a^{n-1}, a^n, a^{2n-1})$, showing that
$2n - 1$ is achievable.
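
For unary languages the subword order is just the order on lengths, so the minimal witness of this example can be found by exhaustive search. The following Python sketch does this under our own conventions: the name `shortest_unary_convexity_witness` is hypothetical, a unary language is represented by a predicate on word lengths, and the search says nothing about non-unary alphabets.

```python
def shortest_unary_convexity_witness(in_language, max_len):
    """Return lengths (|u|, |v|, |w|) with |u| < |v| < |w|, u and w in the
    language, v not in it, and |w| as small as possible; None if no such
    triple exists with |w| <= max_len.

    in_language(k) decides whether a^k belongs to the unary language.
    """
    for m in range(max_len + 1):          # candidate |w|
        if not in_language(m):
            continue
        for j in range(m):                # candidate |v|
            if in_language(j):
                continue
            for i in range(j):            # candidate |u|
                if in_language(i):
                    return i, j, m
    return None


if __name__ == "__main__":
    for n in range(2, 7):
        # a^(n-1) (a^n)^*: lengths congruent to n - 1 modulo n.
        print(n, shortest_unary_convexity_witness(lambda k: k % n == n - 1, 3 * n))
```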

4 Languages specified by NFA's 

In this section we consider, for NFA's, some of the same problems that we considered for DFA's in previous sections.






4.1 Deciding convexity for NFA's 



Our main result is that some of our decision problems become PSPACE-complete if M is 
represented by an NFA. Our fundamental tool is the following classical lemma p]: 

Lemma 27. Let $T$ be a one-tape deterministic Turing machine and $p(n)$ a polynomial such
that $T$ never uses more than $p(|x|)$ space on input $x$. Then there is a finite alphabet $A$
and a polynomial $q(n)$ such that we can construct a regular expression $r_x$ in $q(|x|)$ steps,
such that $L(r_x) = A^*$ if $T$ doesn't accept $x$, and $L(r_x) = A^* - \{w\}$ for some nonempty $w$
(depending on $x$) otherwise. Similarly, we can construct an NFA $M_x$ in $q(|x|)$ steps, such
that $L(M_x) = A^*$ if $T$ doesn't accept $x$, and $L(M_x) = A^* - \{w\}$ for some nonempty $w$
(depending on $x$) otherwise.

Theorem 28. The problem of deciding whether a given regular language $L$, represented by
an NFA or regular expression, is prefix-convex (resp., suffix-, factor-, subword-convex), or
prefix-closed (resp., suffix-, factor-, subword-closed) is PSPACE-complete.

Proof. We prove the result for factor-convexity, the other results being proved in the same 
way. 

First, let's see that the problem of deciding factor-convexity is in PSPACE. We actually
show that we can solve it in NPSPACE, and then use Savitch's theorem that PSPACE =
NPSPACE.

Suppose $L$ is accepted by an NFA $M$ with $n$ states. Then, by the subset construction, $L$
is accepted by a DFA with $\le 2^n$ states. From Theorem 2 above, we see that if $L$ is not
factor-convex, we can demonstrate this by exhibiting $u, v, w$ with $u$ a factor of $v$ and $v$ a factor of
$w$, and $u, w \in L$, $v \notin L$, and then checking that these conditions are fulfilled. Furthermore,
from Corollary 9, if such $u, v, w$ exist, then $|u|, |v|, |w| = O((2^n)^3)$. In polynomial space, we
can count up to $2^{3n}$. Write $w = x_1 x_2 x_3 x_4 x_5$, and let $v = x_2 x_3 x_4$ and $u = x_3$. We use boolean
matrices to keep track of, for each state of $M$, what state we would be in after reading
prefixes of $w$. We guess the appropriate words $x_1, x_2, x_3, x_4, x_5$ symbol-by-symbol, using a
counter to ensure these words are shorter than $2^{3n}$. We then verify that $x_3$ and $x_1 x_2 x_3 x_4 x_5$
are in $L$ and $x_2 x_3 x_4$ is not.

Now let's see that the problem is PSPACE-hard. Since $A^*$ is factor-convex and $A^* - \{w\}$
is not if $w \ne \varepsilon$, we could use an algorithm solving the factor-convex problem to decide
acceptance for polynomial-space-bounded Turing machines. □

However, the situation is different for deciding the property of prefix-freeness,
suffix-freeness, etc., for languages represented by NFA's, as the following theorem shows. This
result was proved already by Han et al. [8] through a different approach.

Theorem 29. Let $M$ be an NFA with $n$ states and $t$ transitions. Then we can decide in
$O(n^2 + t^2)$ time whether $L(M)$ is prefix-free (resp., suffix-free, factor-free, subword-free).

Proof. We give the full details for prefix-freeness, and sketch the result for the other three
cases.

Given $M = (Q, \Sigma, \delta, q_0, F)$, create an NFA $M'$ accepting $L(M)\Sigma^+$. This can be done,
for example, by adding a transition on each $a \in \Sigma$ from each old final state of $M$ to a new
state $q_f$, and having a loop on $q_f$ to itself on each $a \in \Sigma$. Finally, let the new set of final
states for $M'$ be $\{q_f\}$. Clearly $L(M)$ is prefix-free if and only if $L(M) \cap L(M') = \emptyset$.
We can construct an NFA $M''$ accepting $L(M) \cap L(M')$ using the usual "direct product"
construction. If the original $M$ had $n$ states and $t$ transitions, the new $M'$ has $n + 1$ states
and at most $t + 2n|\Sigma|$ transitions. So $M''$ has $n(n + 1)$ states and at most $t(t + 2n|\Sigma|)$
transitions. Since without loss of generality we can assume that $t \ge n - 1$ (otherwise $M$ is
not connected), it costs $O(n^2 + t^2)$ to check whether $L(M'') = \emptyset$ using depth-first search.

For suffix-freeness, we carry out a similar construction for $L(M) \cap \Sigma^+ L(M)$. For
factor-freeness, we carry out a similar construction for $L(M) \cap (\Sigma^+ L(M)\Sigma^* \cup \Sigma^* L(M)\Sigma^+)$.

For subword-freeness, we carry out a similar but slightly more involved construction,
which is as follows: create $M'$ by making two copies of $M$. Add a transition from each state
$q$ to its copy $q'$ on each letter of $\Sigma$, and add transitions from each copy $q'$ to itself on each
letter of $\Sigma$. The final states of $M'$ are the final states in the part corresponding to the copied
states. Formally, $M' = (Q \cup Q', \Sigma, \delta', q_0, F')$ where $Q' = \{q' : q \in Q\}$, $F' = \{q' : q \in F\}$,
and $\delta'(q, a) = \delta(q, a) \cup \{q'\}$ for all $q \in Q$, $a \in \Sigma$, and $\delta'(q', a) = \delta(q, a)' \cup \{q'\}$ for all
$q \in Q$, $a \in \Sigma$, where $\delta(q, a)'$ denotes the set of copies of the states in $\delta(q, a)$. Then $M'$ accepts the language of all words that are strict superwords of words
accepted by $M$. We now create the NFA for $L(M) \cap L(M')$ as before. □
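
The prefix-freeness test in this proof can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: the function name `is_prefix_free` and the dictionary encoding of $\delta$ are ours, the NFA is assumed to have no $\varepsilon$-transitions, the product automaton $M''$ is explored on the fly rather than built explicitly, and breadth-first search is used in place of depth-first search (either suffices for an emptiness test).

```python
from collections import deque


def is_prefix_free(alphabet, delta, start, finals):
    """Decide whether L(M) is prefix-free for an NFA M without epsilon moves.

    delta maps (state, letter) to a set of states; missing keys mean no move.
    The search explores pairs (p, r), where p is a state of M and r is a
    state of M' (M plus a new state q_f accepting L(M) Sigma^+).
    """
    q_f = object()  # fresh state, guaranteed distinct from every state of M

    def step_prime(r, a):
        """Transitions of M' from state r on letter a."""
        if r is q_f:
            return {q_f}
        targets = set(delta.get((r, a), ()))
        if r in finals:
            targets.add(q_f)
        return targets

    seen = {(start, start)}
    queue = deque(seen)
    while queue:
        p, r = queue.popleft()
        if p in finals and r is q_f:
            return False  # some word lies in both L(M) and L(M) Sigma^+
        for a in alphabet:
            for p2 in delta.get((p, a), ()):
                for r2 in step_prime(r, a):
                    if (p2, r2) not in seen:
                        seen.add((p2, r2))
                        queue.append((p2, r2))
    return True


if __name__ == "__main__":
    # Hypothetical NFA for {a, ab}: not prefix-free, since a is a prefix of ab.
    delta = {(0, "a"): {1}, (1, "b"): {2}}
    print(is_prefix_free("ab", delta, 0, {1, 2}))
```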



4.2 Minimal witnesses for NFA's 

We have already seen that the length of the minimal witness for the failure of the convex or 
closed properties is polynomial in the size of the DFA. For the case of NFA's, however, this 
bound no longer holds. 

Theorem 30. There exists a class of NFA's with $O(n)$ states such that the shortest witness
to the failure of the prefix-convex (resp., suffix-convex, factor-convex, subword-convex) or
prefix-closed (resp., suffix-closed, factor-closed, subword-closed) property is of length $2^{\Omega(n)}$.

Proof. In Ellul et al. [HI §5, p. 433] the authors show how to construct a regular expression
$E$ of length $O(n)$ that accepts all words up to some length $2^{\Omega(n)}$, at which point a string
is omitted. From $E$ one can construct an NFA with $O(n)$ states accepting an $L$ with the
desired property. □

For the prefix-free, etc., properties, we have

Theorem 31. There exists a class of languages, accepted by NFA's with $O(n)$ states and
$O(n)$ transitions, such that the minimal witness for the failure of the prefix-free property is
of length $\Omega(n^2)$.

Proof. For non-prefix-free, we can use the reverse of the language defined in the proof of
Theorem 20. □



For the failure of the subword-free property, however, we cannot improve the bound we
obtained for DFA's in Corollary 24, as the proof we presented there also works for NFA's.

5 Languages specified by context-free grammars



If L is represented by a context-free grammar, then the decision problems corresponding 
to convex and closed languages become undecidable. This follows easily from a well-known 
result that the set of invalid computations of a Turing machine is a CFL [9, Lemma 8.7, p.
203].

Similarly, the decision problems corresponding to the properties of prefix-free, suffix-free, 
and factor-free become undecidable for CFL's, as shown by Jürgensen and Konstantinidis
PH Thm. 9.5, p. 581]. 

However, testing subword-freeness is still decidable for CFL's: 

Theorem 32. There is an algorithm that, given a context-free grammar G, will decide if 
L(G) is subword-free. 

Proof. If $L = L(G)$ is infinite, then $L$ is not subword-free by the pumping lemma. For if $w \in L$ and $|w|$
is sufficiently large, then we can factor $w$ as $uvxyz$, where $|vy| \ge 1$, such that $uxz \in L$. But
$uxz$ is a subword of $w$. We can test if $L(G)$ is infinite by a well-known result [9, Thm. 6.6,
p. 137]. Otherwise, if $L(G)$ is finite, we can enumerate all its words and test each for the
subword-free property. □
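
The last step of this algorithm, testing a finite set of words for subword-freeness, can be sketched as follows in Python. The function names `is_subword` and `is_subword_free` are ours, and the sketch assumes the words of $L(G)$ have already been enumerated; it does not parse grammars or test finiteness.

```python
def is_subword(v, w):
    """True if v is a (not necessarily contiguous) subword of w."""
    remaining = iter(w)
    # Each membership test consumes the iterator up to the matched letter,
    # so the letters of v must be found in w in left-to-right order.
    return all(letter in remaining for letter in v)


def is_subword_free(words):
    """True if no word of the finite set is a proper subword of another."""
    words = set(words)
    return not any(
        v != w and is_subword(v, w) for v in words for w in words
    )


if __name__ == "__main__":
    print(is_subword("ace", "abcde"), is_subword("aec", "abcde"))
    print(is_subword_free({"ab", "ba"}), is_subword_free({"ab", "aab"}))
```

The test is quadratic in the number of words, which is adequate for the decidability argument but not meant as an efficient procedure.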

6 Conclusions 

We have shown that we can decide in $O(n^3)$ time whether a language specified by a DFA is
prefix-, suffix-, factor-, or subword-convex, and that the corresponding closure and freeness
properties can be tested in $O(n^2)$ time. If the language is specified by an NFA or a regular
expression, the convexity and closure problems are PSPACE-complete, while the freeness
properties can still be tested in polynomial time.

Our results about the sizes of minimal witnesses for the various classes are summarized
in Table 1. All results are known to be best possible, except the $3n - 2$ upper bound for
subword-convexity; in this case, we do not know whether the bound is achievable.



Table 1: Sizes of witnesses

relation \ property    convexity        closure          freeness
factor                 $\Theta(n^3)$    $\Theta(n^2)$    $\Theta(n^2)$
prefix                 $2n - 1$         $n$              $2n - 1$
suffix                 $\Theta(n^3)$    $\Theta(n^2)$    $\Theta(n^2)$
subword                $3n - 2$         $n$              $2n - 1$